翻訳と辞書 (Translation and Dictionary)
Robust simple linear regression : English Wikipedia | Theil–Sen estimator
In non-parametric statistics, the Theil–Sen estimator is a method for robust simple linear regression that chooses the median slope among all lines through pairs of two-dimensional sample points. It has also been called Sen's slope estimator, slope selection, the single median method, the Kendall robust line-fit method, and the Kendall–Theil robust line. It is named after Henri Theil and Pranab K. Sen, who published papers on this method in 1950 and 1968 respectively. It can be computed efficiently and is insensitive to outliers; it can be significantly more accurate than non-robust simple linear regression for skewed and heteroskedastic data, and it competes well against non-robust least squares even for normally distributed data in terms of statistical power. It has been called "the most popular nonparametric technique for estimating a linear trend".

==Definition==
As defined by Theil (1950), the Theil–Sen estimator of a set of two-dimensional points (x_i, y_i) is the median m of the slopes (y_j − y_i)/(x_j − x_i) determined by all pairs of sample points. Sen (1968) extended this definition to handle the case in which two data points have the same x-coordinate. In Sen's definition, one takes the median of the slopes defined only from pairs of points having distinct x-coordinates.

Once the slope m has been determined, one may determine a line from the sample points by setting the y-intercept b to be the median of the values y_i − m x_i. As Sen observed, this estimator is the value that makes the Kendall tau rank correlation coefficient comparing the values of x_i with the residual for the ''i''-th observation become approximately zero.

A confidence interval for the slope estimate may be determined as the interval containing the middle 95% of the slopes of lines determined by pairs of points, and may be estimated quickly by sampling pairs of points and determining the 95% interval of the sampled slopes. According to simulations, approximately 600 sample pairs are sufficient to determine an accurate confidence interval.

Note: For determining confidence intervals, pairs of points must be sampled with replacement; this means that the set of pairs used in this calculation includes pairs in which both points are the same as each other. These pairs are always outside the confidence interval, because they do not determine a well-defined slope value, but using them as part of the calculation causes the confidence interval to be wider than it would be without them.
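The definition above can be illustrated with a short Python sketch. It is a minimal illustration, not a reference implementation: the function names `theil_sen` and `sampled_slope_interval`, the `seed` parameter, and the index-based percentile rule are assumptions made for this example. The slope is the median over all pairs with distinct x-coordinates (Sen's definition), and the intercept is the median of y_i − m x_i. The interval function samples pairs with replacement but, as a simplification, discards pairs that give no well-defined slope, so it can be somewhat narrower than the interval described in the note above.

```python
import random
from itertools import combinations
from statistics import median


def theil_sen(points):
    """Return (slope, intercept) of the Theil-Sen line for (x, y) points."""
    # Slopes of all pairs of points with distinct x-coordinates.
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(points, 2)
        if x1 != x2
    ]
    if not slopes:
        raise ValueError("need two points with distinct x-coordinates")
    m = median(slopes)
    # Intercept: median of the values y_i - m * x_i.
    b = median(y - m * x for x, y in points)
    return m, b


def sampled_slope_interval(points, n_pairs=600, level=0.95, seed=0):
    """Approximate slope interval from randomly sampled pairs.

    Simplification (assumption of this sketch): pairs with equal
    x-coordinates, including a point paired with itself, are dropped
    rather than counted as lying outside the interval.
    """
    rng = random.Random(seed)
    n = len(points)
    slopes = []
    for _ in range(n_pairs):
        (x1, y1), (x2, y2) = points[rng.randrange(n)], points[rng.randrange(n)]
        if x1 != x2:
            slopes.append((y2 - y1) / (x2 - x1))
    if not slopes:
        raise ValueError("no sampled pair had distinct x-coordinates")
    slopes.sort()
    # Cut off an equal share of sampled slopes at each end.
    cut = int((1 - level) / 2 * len(slopes))
    return slopes[cut], slopes[len(slopes) - 1 - cut]


if __name__ == "__main__":
    # Six points on y = 2x + 1 plus one gross outlier at x = 7.
    data = [(x, 2 * x + 1) for x in range(1, 7)] + [(7, 100)]
    print(theil_sen(data))
    print(sampled_slope_interval(data))
```

In the usage example, 15 of the 21 pairwise slopes come from points on the line y = 2x + 1, so the median slope is unaffected by the single outlier. This is the insensitivity to outliers described above.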
Excerpt source: Wikipedia, the free encyclopedia, "Theil–Sen estimator".